What are the Philosophers Talking About?

Install and load libraries

The following functions and packages are used throughout the project.

Introduction

A literal study of philosophical works is necessary. Philosophical language is very abstract, but if you extract words from text and then observe and study them, you can draw many interesting conclusions. The philosophical issues studied by different schools and the philosophical fields studied by different philosophers may have commonalities as well as differences.

The main focus of this article is to use topic modeling to see whether there are interesting overlaps in topics between schools. We then look at detailed sentences to see what exactly the schools are talking about.

Part 1 Data Preprocessing

Import the data from Kaggle: History of Philosophy (https://www.kaggle.com/kouroshalizadeh/history-of-philosophy) and take a look at the dimensions and structure of the dataset.

As we can see, the dataset contains 360808 rows and 11 columns. The variables $original\_publication\_date$, $corpus\_edition\_date$, and $sentence\_length$ are integers, while the remaining variables are of type object. No null values are detected. Moreover, the dataset contains 59 different books written by 36 authors from 13 distinct schools.
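The checks described above can be sketched with pandas roughly as follows. The small inline frame is an invented stand-in for the Kaggle file (its rows and the helper name `summarize` are assumptions made for illustration); only the column names come from the dataset description.

```python
import pandas as pd

# Toy stand-in for the Kaggle dataset; the rows are invented for illustration.
df = pd.DataFrame(
    {
        "title": ["Book A", "Book A", "Book B"],
        "author": ["Author 1", "Author 1", "Author 2"],
        "school": ["school_x", "school_x", "school_y"],
        "sentence_length": [42, 17, 88],
    }
)


def summarize(frame):
    """Collect the basic facts reported in the text: shape, nulls, distinct counts."""
    return {
        "shape": frame.shape,
        "null_cells": int(frame.isnull().sum().sum()),
        "n_titles": frame["title"].nunique(),
        "n_authors": frame["author"].nunique(),
        "n_schools": frame["school"].nunique(),
    }


summary = summarize(df)
print(df.dtypes)  # column types (integer vs. object)
print(summary)
```

On the real data, the same calls would report the row and column counts, the column types, and the number of distinct books, authors, and schools.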

More information can be extracted from the variable $tokenized\_txt$ after NLP preprocessing: the function 'lemmatized_sentence' removes stop words and lemmatizes each sentence. The lemmatized sentences are stored in the variable $lemmatized\_str$, and their lengths are stored in the variable $lemmatized\_str\_len$.
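The original body of 'lemmatized_sentence' is not shown here, so the following is only a minimal sketch of the idea. The tiny stop-word set, the dictionary-based `toy_lemmatize` helper, and the choice of character count for the length are all assumptions; a real run would plug in a full stop-word list and a proper lemmatizer (for example from NLTK or spaCy).

```python
# Tiny illustrative resources, invented for this sketch.
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "and", "to", "in"}
TOY_LEMMAS = {"things": "thing", "bodies": "body", "senses": "sense"}


def toy_lemmatize(word):
    """Dictionary lookup that leaves unknown words unchanged (not a real lemmatizer)."""
    return TOY_LEMMAS.get(word, word)


def lemmatized_sentence(tokens, stop_words=STOP_WORDS, lemmatize=toy_lemmatize):
    """Drop stop words, lemmatize the remaining tokens, and join them into one string."""
    lowered = (tok.lower() for tok in tokens)
    kept = [lemmatize(tok) for tok in lowered if tok not in stop_words]
    return " ".join(kept)


tokens = ["The", "senses", "of", "the", "bodies", "are", "things"]
lemmatized_str = lemmatized_sentence(tokens)
lemmatized_str_len = len(lemmatized_str)  # assumption: length means character count
print(lemmatized_str, lemmatized_str_len)
```

Passing the lemmatizer as a parameter keeps the sketch independent of any particular NLP library.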

Part 2 EDA

To explore the data, first take a brief look at it. I pick only the variables $title$, $author$, $school$, $original\_publication\_date$, $sentence\_length$, $sentence\_lowered$, and $tokenized\_txt$ for exploratory data analysis.

Observations

Timeline figure and more insights

This timeline shows the number of works, classified by school, at different points in time.

It starts around 350 B.C. and runs to the late 20th century. The data does not contain many works from the medieval period. The Renaissance of the 15th and 16th centuries heralded the beginning of the modern period, after which many more schools emerged, as shown by the variety of colors in the figure.

We can also observe that 3 books by Nietzsche were published in 1888, so Nietzsche was especially productive that year. Where a color appears continuously on the timeline, we can also see that some schools were clearly popular for a period of time. For instance, from 1781 to 1820, German_idealism published books continuously. Similarly, Analytic published its first work in 1910 and kept appearing from time to time, as late as 1985.

Part 3 Topic Modeling

Create Topic Models

Take a look at the most frequent words across all sentences.

In the bar plot, the highest counts belong to 'things' and 'man'. This makes sense, since both are general words that appear in many sentences. We can also see that 'nature', 'world', 'sense', and 'body' are things the philosophers care about, and perhaps they often raise questions like 'What is good?', 'Which one is true?', and 'Does that make sense?'.
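A word-frequency count of this kind can be sketched with the standard library. The three sentences below are invented stand-ins for the lemmatized sentences, and `top_words` is a hypothetical helper name.

```python
from collections import Counter

# Invented lemmatized sentences standing in for the real corpus.
sentences = [
    "man know thing nature",
    "thing exist world",
    "man sense body thing",
]


def top_words(texts, n=2):
    """Count whitespace-separated words across all texts and return the n most common."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts.most_common(n)


print(top_words(sentences))
```

The resulting (word, count) pairs are what a bar plot of the most frequent words would display.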

pyLDAvis Visualization

Next, I use the pyLDAvis package to visualize the LDA model. Some points should be explained clearly:

For the left panel,

For the right panel,

As shown above, 8 distinct topics do not overlap with any other circle, and there are 4 clusters of overlapping topics that might share similar philosophical opinions or questions. Due to limited time, I will only explore the overlapping topics, since those possible similarities are what I care about.

I discuss the clusters from bottom to top and from right to left.
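Circles that sit close together or overlap correspond to topics with similar word distributions. By default, pyLDAvis measures this with the Jensen-Shannon divergence between topics before projecting them to two dimensions. The sketch below computes that divergence in plain Python; the three four-word topic distributions are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Invented word distributions over the same four-word vocabulary.
topic_a = [0.5, 0.3, 0.1, 0.1]
topic_b = [0.4, 0.4, 0.1, 0.1]
topic_c = [0.05, 0.05, 0.4, 0.5]

near = jensen_shannon(topic_a, topic_b)
far = jensen_shannon(topic_a, topic_c)
print(near, far)
```

Here `topic_a` and `topic_b` put most of their weight on the same words, so their divergence is small and their circles would be drawn near each other, while `topic_c` would land far away.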

First is a cluster that includes topics 4, 6, and 18. Topic 4 has top words like 'place', 'water', 'animals', and 'earth', which are closely related to nature. Topic 6 contains words like 'value', 'labor', 'money', and 'capital', which discuss things on a more social level and relate more to human behavior. Not surprisingly, topic 18, which overlaps largely with topic 6, contains words like 'woman', 'social', 'desire', and 'production', which mostly involve values and labor. The questions around these three topics may concern the social distribution of natural resources and the rational distribution of value among human resources.

Next is another cluster containing topics 1, 9, and 17. Topic 9 has top words like 'state', 'power', 'laws', and 'government', which to me are highly related to politics. Topic 17 has words like 'action', 'employment', 'investment', and even 'violence'. The similarity between these two topics might concern the balance between political power and the supply and demand of social classes. Topic 1 has the words 'body' and 'spirits' as well as words related to food, which can reasonably be explained in terms of supply and demand.

Move to the right cluster, with topics 5, 8, 2, and 20. Topic 20 says a lot about time and space, while topic 5 talks more about existence, mind, and soul. Topic 8 is more about sense, life, and meaning. These 3 topics might raise ideas about exploring the meaning of life in the dimensions of time and space. Topic 2 has words like 'concept', 'consciousness', and 'experience', and may combine with topic 8 around exploring life on the basis of self-ideology.

Last, the cluster on the bottom right consists of topics 12 and 16. Topic 12 has words like 'true', 'false', 'argument', 'proof', and 'method', while words like 'reason', 'sense', 'doubt', and 'explanation' come from topic 16. These two topics seem to discuss how to conduct effective dialectics and how to judge or support one's own thinking.

Look at specific sentences with top words

Looking into the word 'labour', it seems that production methods are heatedly discussed in topic 6. The word 'class' from topic 18 is frequently associated with 'work', which is also related to 'labour'. So we can indeed find similarities between topics 6 and 18, and the hypothesis made in the previous section makes sense.
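Pulling out the sentences behind a top word can be sketched as a simple filter. This assumes each sentence already carries a topic label; how the notebook obtained that label at this point is not shown, and the records and the helper name `sentences_with_word` are invented for illustration.

```python
# Invented records: each holds a sentence and the topic it is assigned to.
records = [
    {"sentence": "labour adds value to the commodity", "topic": 6},
    {"sentence": "the working class sells its labour", "topic": 18},
    {"sentence": "the soul perceives the body", "topic": 5},
    {"sentence": "capital employs labour for profit", "topic": 6},
]


def sentences_with_word(rows, topic, word):
    """Return sentences assigned to `topic` that contain `word` as a whole token."""
    return [
        row["sentence"]
        for row in rows
        if row["topic"] == topic and word in row["sentence"].split()
    ]


for sentence in sentences_with_word(records, topic=6, word="labour"):
    print(sentence)
```

Matching on whole tokens avoids counting 'labour' inside longer words such as 'labourer'.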

Topic distribution - Bar plot

Overall, the LDA model produces a matrix in which each row corresponds to a sentence and gives the proportions of that sentence assigned to each topic, based on the words it contains. Each row can be treated as a topic distribution, from which a dominant topic can be assigned to the sentence. Below are the distribution and the resulting dominant topic.
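Assigning a dominant topic amounts to taking the largest entry in each row of the sentence-topic matrix. The sketch below uses an invented matrix with three sentences and three topics, not actual LDA output, and `dominant_topic` is a hypothetical helper name.

```python
# Invented sentence-topic matrix: one row per sentence, one column per topic.
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.40, 0.30],
]


def dominant_topic(row):
    """Return (topic index, proportion) for the largest entry in one row."""
    index = max(range(len(row)), key=row.__getitem__)
    return index, row[index]


dominant = [dominant_topic(row) for row in doc_topic]
print(dominant)
```

The third row shows a weak case: the winning topic holds only 0.40 of the sentence, so the proportion is worth keeping next to the label.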

From the table above, I can use groupby to look for interesting observations.
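A grouping of this kind can be sketched with pandas as follows. The column name `dominant_topic` and the five rows are assumptions made for illustration; the sketch computes both the share of each school within a topic and the share of each topic within a school.

```python
import pandas as pd

# Invented sentence-level labels.
labeled = pd.DataFrame(
    {
        "school": ["capitalism", "capitalism", "communism", "communism", "analytic"],
        "dominant_topic": [6, 6, 6, 18, 12],
    }
)

# Number of sentences per (topic, school) pair; missing pairs become 0.
counts = labeled.groupby(["dominant_topic", "school"]).size().unstack(fill_value=0)

# Share of each school within a topic (each row sums to 1).
share_within_topic = counts.div(counts.sum(axis=1), axis=0)

# Share of each topic within a school (each column sums to 1).
share_within_school = counts.div(counts.sum(axis=0), axis=1)

print(counts)
print(share_within_topic.round(2))
print(share_within_school.round(2))
```

Either table can be passed to a stacked bar plot: the first shows which schools dominate a topic, the second which topics dominate a school.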

From the bar plot above, topics 5, 2, 8, 12, and 6 all have a large number of sentences. From the LDA visualization, we saw that topics 2, 5, and 8 are from the same cluster, which makes sense since that cluster is a strong focus among philosophers.

From the bar plot above, we see that each topic has sentences from different schools. Some topics are dominated by particular schools, while others have evenly distributed proportions across schools. I care most about topics 5, 2, 8, 12, and 6, because they have more sentences to analyze and because topics 5, 2, and 8 are from the same cluster.

In topic 5, the schools Aristotle, Empiricism, and Rationalism have similar proportions of sentences. The same is true for Continental and Phenomenology in topic 8, and for Communism and Capitalism in topic 6.

Topics 2 and 12 each have a dominant school: German_idealism and Analytic, respectively.

We may notice a correlation between this bar plot and the previous one. Capitalism and Communism are both highly interested in topic 6. Empiricism and Rationalism have a preference for topic 5, while Phenomenology and Continental have a preference for topic 8.

Wordclouds visualization - based on interests from previous analysis.

From the wordcloud above, we notice certain connections between the topics of the two schools. Capitalism talks a lot about labour, proportion, produce, purchase, consume, and supply, which makes sense since the main idea of capitalism is that the production of goods and services is based on supply and demand in the general market. Communism, by contrast, focuses on society, commodity, immense, unit, wealth, and single, which is also reasonable since its main theory is that all property is publicly owned and each person works and is paid according to their abilities and needs. This suggests that the main difference between the two schools concerns the resources or means of production.

From my own background knowledge, I treated Empiricism and Rationalism as two schools holding opposite ideas, since Empiricism relies on sense and perception while Rationalism holds on to inner ideas. Consistent with this, Rationalism has top words like 'body', 'thought', 'god', and 'self', while 'reader', 'paper', and 'place' appear more in Empiricism.

Since Phenomenology has words like 'think', 'remain', 'truth', and 'answer', and Continental has words like 'body', 'natural', 'space', and 'define', they may be discussing similar ideas involving existence and experience.

Interactive wordcloud visualizations (just for a quick look)

Below is an interactive view of the wordclouds, in which the maximum number of words can be chosen as 20, 50, 100, or 150.

word2vec visualizations (for specific words)

Use word2vec to place the vectors of similar words close together in vector space, using words from a couple of the topics above, so that more patterns may be discerned.

As shown above, produce, power, and labour are pretty close, and rent and tax are close. These topics might discuss production and money a lot.

Here, nature, history, philosophy, science are close; function, structure, analysis are close; consciousness, perception, idea, knowledge are close. These two schools may be talking about analysis of functions and also personal experiences.

From the figure, existence and being are close; sense and cause are close; human, thought, and nature are close. These two might be focusing on questions such as the meaning of existence and human thought.
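Closeness between word vectors is commonly measured with cosine similarity, which the sketch below computes by hand. The three-dimensional vectors are invented for illustration; real word2vec embeddings are learned from the corpus and have many more dimensions, and the plots discussed here are lower-dimensional views of them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors, not trained embeddings.
vectors = {
    "rent": [0.9, 0.1, 0.0],
    "tax": [0.8, 0.2, 0.1],
    "soul": [0.0, 0.2, 0.9],
}


def most_similar(word, table):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    others = [candidate for candidate in table if candidate != word]
    return max(others, key=lambda c: cosine_similarity(table[word], table[c]))


print(most_similar("rent", vectors))
```

With these toy vectors, 'rent' lands next to 'tax' and far from 'soul', mirroring the kind of grouping read off the plots.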

Part 4 Sentiment Analysis (Basic application)

Here, I only want to analyze one question: if two schools' topics are highly correlated, will their sentiment be the same, given that they can agree or disagree on the same topic? Schools which I think have overlapping topics:

From the stacked bar plot, I conclude that neither Capitalism nor Communism is negative overall, but Capitalism talks more positively than Communism. Phenomenology talks less negatively than Continental. Empiricism and Rationalism have equal distributions of sentiment.
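The proportions behind a stacked bar plot like this can be sketched as follows, assuming each sentence already has a sentiment label. The sentiment tool used in the notebook is not shown, so the labels below are invented and `sentiment_shares` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Invented (school, sentiment) pairs; real labels would come from a sentiment classifier.
labeled_sentences = [
    ("capitalism", "positive"),
    ("capitalism", "positive"),
    ("capitalism", "neutral"),
    ("communism", "positive"),
    ("communism", "neutral"),
    ("communism", "negative"),
]


def sentiment_shares(pairs):
    """Return {school: {sentiment: share}}, with shares summing to 1 per school."""
    counts = defaultdict(Counter)
    for school, sentiment in pairs:
        counts[school][sentiment] += 1
    return {
        school: {label: n / sum(tally.values()) for label, n in tally.items()}
        for school, tally in counts.items()
    }


shares = sentiment_shares(labeled_sentences)
print(shares)
```

Comparing shares rather than raw counts keeps schools with very different numbers of sentences on the same scale.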

Conclusion

By applying topic modeling, even outsiders to philosophy can quickly grasp some patterns. By manually setting the number of topics to 20, words are captured and grouped into corresponding topics, and we can observe which topics overlap, which are independent, and even how different they are (through the distance between them). Some words, such as nature, man, and society, have always been a focus of philosophers.

Then, by giving each sentence a main topic, we can observe the proportion of each school in the topics, and we can also observe the proportion of each topic in the schools, so that we can know which schools focus on similar topics and which schools are unique.

Finally, we carry out sentiment analysis. This part is relatively brief but also quite interesting. We can see that although some schools hold opposing views, their attitudes are all positive, while schools that share the same views still differ in how positive or negative they are.

I would like to continue studying more machine learning algorithms, but there is not enough time.